
2D Convolution example using global and shared memory #2228

Merged

Conversation

@mehmetyusufoglu (Contributor) commented on Jan 22, 2024:

An example: a 2D convolutional filter applied to a matrix. In the first commit the filter-matrix values were kept in constant memory, but because of the GitLab pipeline error "The SYCL backend does not support global device constants", constant-memory usage was removed in the second commit.
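For reference, a minimal sketch of what the first commit's constant-memory variant may have looked like, assuming alpaka's ALPAKA_STATIC_ACC_MEM_CONSTANT macro and an illustrative 3x3 averaging filter (g_filter is a hypothetical name, not from the PR):

// Filter coefficients kept in device constant memory. Compiling such a
// declaration for the SYCL backend triggers the pipeline error quoted above,
// which is why the PR moved the filter out of constant memory.
ALPAKA_STATIC_ACC_MEM_CONSTANT float g_filter[3 * 3]
    = {1.f / 9, 1.f / 9, 1.f / 9, 1.f / 9, 1.f / 9, 1.f / 9, 1.f / 9, 1.f / 9, 1.f / 9};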

  • Kernel1: uses global memory only, without tiling.
  • Kernel2: uses tiling; the block size is assumed to be equal to the tile size. Each block works on the domain of one tile. The tile is first copied to shared memory, since each of its elements is accessed many times. At the tile border, some elements of neighbouring tiles are needed; those values are read from global memory (see the sketch below).
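A minimal sketch of Kernel2's tiling scheme follows. It is illustrative, not the PR's exact code: the kernel name, the parameter names (input, filter, filterWidth, ...) and the zero-padding at the matrix border are assumptions; a square filter of odd extent is assumed, and the tile extent is taken to equal the block extent (alpaka Vec index 0 is y, index 1 is x).

#include <alpaka/alpaka.hpp>
#include <cstdint>

struct SharedMemConv2DKernel
{
    template<typename TAcc, typename TElem>
    ALPAKA_FN_ACC void operator()(
        TAcc const& acc,
        TElem const* input,
        TElem const* filter,
        TElem* output,
        std::int32_t matrixWidth,
        std::int32_t matrixHeight,
        std::int32_t filterWidth) const
    {
        auto const gridIdx = alpaka::getIdx<alpaka::Grid, alpaka::Threads>(acc);
        auto const blockIdx = alpaka::getIdx<alpaka::Block, alpaka::Threads>(acc);
        auto const blockExtent = alpaka::getWorkDiv<alpaka::Block, alpaka::Threads>(acc);

        auto const row = static_cast<std::int32_t>(gridIdx[0]);
        auto const col = static_cast<std::int32_t>(gridIdx[1]);
        auto const tileRow = static_cast<std::int32_t>(blockIdx[0]);
        auto const tileCol = static_cast<std::int32_t>(blockIdx[1]);
        auto const tileHeight = static_cast<std::int32_t>(blockExtent[0]);
        auto const tileWidth = static_cast<std::int32_t>(blockExtent[1]);

        // Dynamic shared memory holds exactly one tile (tile extent == block extent).
        auto* const tile = alpaka::getDynSharedMem<TElem>(acc);

        // Each thread copies its own matrix element into the shared-memory tile.
        if(row < matrixHeight && col < matrixWidth)
            tile[tileRow * tileWidth + tileCol] = input[row * matrixWidth + col];
        alpaka::syncBlockThreads(acc);

        if(row >= matrixHeight || col >= matrixWidth)
            return;

        TElem sum{0};
        std::int32_t const r = filterWidth / 2;
        for(std::int32_t i = -r; i <= r; ++i)
        {
            for(std::int32_t j = -r; j <= r; ++j)
            {
                std::int32_t const y = row + i;
                std::int32_t const x = col + j;
                if(y < 0 || y >= matrixHeight || x < 0 || x >= matrixWidth)
                    continue; // outside the matrix: treat as zero padding
                std::int32_t const ty = tileRow + i;
                std::int32_t const tx = tileCol + j;
                // Inside the tile the neighbour is read from shared memory; at the
                // tile border it belongs to another tile, so read global memory.
                TElem const v = (ty >= 0 && ty < tileHeight && tx >= 0 && tx < tileWidth)
                    ? tile[ty * tileWidth + tx]
                    : input[y * matrixWidth + x];
                sum += v * filter[(i + r) * filterWidth + (j + r)];
            }
        }
        output[row * matrixWidth + col] = sum;
    }
};

Note that for getDynSharedMem to provide enough space, the host side must also specialize alpaka::trait::BlockSharedMemDynSizeBytes for the kernel so that one tile of TElem fits into the dynamically allocated shared memory.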

@mehmetyusufoglu marked this pull request as draft on January 22, 2024 11:44
@mehmetyusufoglu force-pushed the convolution2DExample branch 2 times, most recently from 9c27efd to aa3efcc, on January 24, 2024 10:14
@mehmetyusufoglu changed the title from "[Wip] 2D Convolution example using global and shared memory" to "2D Convolution example using global and shared memory" on January 24, 2024
@mehmetyusufoglu marked this pull request as ready for review on January 24, 2024 10:36
@mehmetyusufoglu force-pushed the convolution2DExample branch 3 times, most recently from dc160f3 to 2b93807, on January 24, 2024 17:16
@mehmetyusufoglu marked this pull request as draft on January 24, 2024 22:06
@mehmetyusufoglu marked this pull request as ready for review on January 26, 2024 10:12
@mehmetyusufoglu force-pushed the convolution2DExample branch 9 times, most recently from ae50779 to a8ac8e6, on January 28, 2024 22:57
@psychocoderHPC added this to the 1.2.0 milestone on Jan 29, 2024
// Allocate shared memory
auto* const sharedN = alpaka::getDynSharedMem<TElem>(acc);
// Fill the shared memory so that tile items are subsequently read from shared memory
if(row < matrixHeight && col < matrixWidth && blockThreadIdx1D < blockThreadExtent.prod())
Member commented on the lines above:
On the host side you use getValidWorkDiv. This means you will have only one thread per block for some alpaka accelerators.
I know you wrote that the block size must be equal to the tile size, but you do not enforce it, e.g. with an ALPAKA_VERIFY.
If you have only one thread in the block, you can simply iterate over the shared memory to fill it.
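A minimal sketch of the suggested check, placed at the top of the kernel; tileHeight and tileWidth are illustrative names for the tile extents the host assumed, and ALPAKA_VERIFY is the alpaka macro the reviewer names:

// Fail fast if the launch configuration violates the block == tile assumption.
auto const blockExtent = alpaka::getWorkDiv<alpaka::Block, alpaka::Threads>(acc);
ALPAKA_VERIFY(static_cast<std::int32_t>(blockExtent[0]) == tileHeight);
ALPAKA_VERIFY(static_cast<std::int32_t>(blockExtent[1]) == tileWidth);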

@mehmetyusufoglu (Author) replied on Feb 5, 2024:

Yes, but if there is one thread per block, it means the accelerator is not a GPU (or a GPU is not a good fit); so we do not know which level of memory is actually used?

@mehmetyusufoglu (Author) replied:

If there is one block, the whole block is loaded into shared memory in the code.

Four further review threads on example/convolution2D/src/convolution2D.cpp were resolved (three of them marked outdated).
@psychocoderHPC merged commit 6116586 into alpaka-group:develop on Feb 28, 2024
22 checks passed
2 participants